Skip to content

Improve diff output#163

Merged
alex-slynko merged 2 commits intomainfrom
improve-diff-output
Apr 13, 2026
Merged

Improve diff output#163
alex-slynko merged 2 commits intomainfrom
improve-diff-output

Conversation

@alex-slynko
Copy link
Copy Markdown
Contributor

@alex-slynko alex-slynko commented Apr 13, 2026

Do only changed columns for tables and join changes for same database/table and different clusters

sample output:

https://github.com/github/warehouse-config/pull/4231#issuecomment-4236107408

alex-slynko and others added 2 commits April 13, 2026 11:50
Instead of showing full .create-merge table lines, show only
added/changed/live-only columns. Uses ColumnDiffHelper to compare
column dictionaries and render a concise diff.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When multiple clusters have the same changes, show them once
with a combined header instead of repeating the full diff for
each cluster. Uses a canonical fingerprint (markdown + comments +
validity) to detect identical diffs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 13, 2026 11:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the generated diff markdown by (1) showing more focused, column-level diffs for table .create-merge changes and (2) grouping clusters that have identical diff output into a single combined markdown section.

Changes:

  • Added cluster grouping logic in markdown generation by computing a per-cluster fingerprint and merging identical outputs.
  • Introduced ColumnDiffHelper and updated ScriptCompareChange to render column-level diffs for CreateMergeTable changes (fallback to line diff otherwise).
  • Added unit tests for column diff rendering and cluster fingerprint grouping.
Show a summary per file
File Description
KustoSchemaTools/KustoSchemaHandler.cs Computes per-cluster fingerprints and groups identical cluster diffs into a combined markdown header/output.
KustoSchemaTools/Changes/ScriptCompareChange.cs Adds a column-diff rendering path for CreateMergeTable and refactors line-diff rendering into a helper method.
KustoSchemaTools/Changes/ColumnDiffHelper.cs New helper that produces a concise column-level diff (added/type-changed + note for live-only columns).
KustoSchemaTools.Tests/Changes/ColumnDiffHelperTests.cs Unit tests validating column diff output for added/type-changed/removed/no-change cases.
KustoSchemaTools.Tests/Changes/ClusterGroupingTests.cs Unit tests validating cluster fingerprint equality/inequality behavior across scenarios.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 5/5 changed files
  • Comments generated: 4

Comment on lines +250 to +253
foreach (var comment in comments.OrderBy(c => c.Kind).ThenBy(c => c.Text))
{
sb.Append($"comment:{comment.Kind}:{comment.FailsRollout}:{comment.Text};");
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BuildClusterFingerprint constructs the grouping key by concatenating raw comment.Text (and later change.Markdown) with :/; delimiters. If any comment/markdown contains these delimiter sequences (e.g. ";change:"), two different diffs can produce the same fingerprint and get incorrectly grouped. Consider generating the fingerprint from a structured representation (e.g., JSON) and hashing it (SHA256), or length-prefix/escape the fields before concatenation.

Copilot uses AI. Check for mistakes.
Comment on lines +255 to +258
foreach (var change in changes)
{
sb.Append($"change:{change.Markdown};");
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BuildClusterFingerprint appends full change.Markdown into the fingerprint string. For large diffs and/or many clusters this can create very large strings and increase memory/CPU usage during grouping. A more efficient approach is to hash each change/comment payload (or stream into a hasher) and group on the fixed-size hash.

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +37
.Where(c => !oldColumns.ContainsKey(c.Key))
.ToList();

var typeChanged = newColumns
.Where(c => oldColumns.ContainsKey(c.Key) && !string.Equals(oldColumns[c.Key], c.Value, StringComparison.OrdinalIgnoreCase))
.Select(c => new { Name = c.Key, OldType = oldColumns[c.Key], NewType = c.Value })
.ToList();

var removedFromYaml = oldColumns
.Where(c => !newColumns.ContainsKey(c.Key))
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BuildColumnDiff enumerates newColumns/oldColumns directly when rendering added, typeChanged, and removedFromYaml. Since these are Dictionary instances, enumeration order can vary depending on how the dictionaries were constructed, which can make the rendered diff (and cluster grouping fingerprint) non-deterministic. Sorting by column name before rendering (and sorting removed columns) would make the output stable across runs/clusters.

Suggested change
.Where(c => !oldColumns.ContainsKey(c.Key))
.ToList();
var typeChanged = newColumns
.Where(c => oldColumns.ContainsKey(c.Key) && !string.Equals(oldColumns[c.Key], c.Value, StringComparison.OrdinalIgnoreCase))
.Select(c => new { Name = c.Key, OldType = oldColumns[c.Key], NewType = c.Value })
.ToList();
var removedFromYaml = oldColumns
.Where(c => !newColumns.ContainsKey(c.Key))
.Where(c => !oldColumns.ContainsKey(c.Key))
.OrderBy(c => c.Key, StringComparer.Ordinal)
.ToList();
var typeChanged = newColumns
.Where(c => oldColumns.ContainsKey(c.Key) && !string.Equals(oldColumns[c.Key], c.Value, StringComparison.OrdinalIgnoreCase))
.OrderBy(c => c.Key, StringComparer.Ordinal)
.Select(c => new { Name = c.Key, OldType = oldColumns[c.Key], NewType = c.Value })
.ToList();
var removedFromYaml = oldColumns
.Where(c => !newColumns.ContainsKey(c.Key))
.OrderBy(c => c.Key, StringComparer.Ordinal)

Copilot uses AI. Check for mistakes.
int last = 0;
for (int i = 0; i < relevantLines.Count; i++)
{
var b = i - 1 > 0 ? relevantLines[i - 1] : null;
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In RenderLineDiff, the previous-line lookup uses i - 1 > 0, which skips index 0 when i == 1. This makes the neighbor-context logic inconsistent (line 0 is never considered as the previous line for line 1). Use i > 0 (or i - 1 >= 0) for the boundary check.

Suggested change
var b = i - 1 > 0 ? relevantLines[i - 1] : null;
var b = i > 0 ? relevantLines[i - 1] : null;

Copilot uses AI. Check for mistakes.
@alex-slynko alex-slynko merged commit 9587667 into main Apr 13, 2026
9 checks passed
@alex-slynko alex-slynko deleted the improve-diff-output branch April 13, 2026 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants